218 research outputs found
Silent speech: restoring the power of speech to people whose larynx has been removed
Every year, some 17,500 people in Europe and North America lose the power of speech after undergoing a laryngectomy, normally as a treatment for throat cancer. Several research groups have recently demonstrated that it is possible to restore speech to these people by using machine learning to learn the transformation from articulator movement to sound. In our project, articulator movement is captured by a technique developed by our collaborators at Hull University called Permanent Magnet Articulography (PMA), which senses the changes in magnetic field caused by movements of small magnets attached to the lips and tongue. This solution, however, requires synchronous PMA-and-audio recordings for learning the transformation and, hence, cannot be applied to people who have already lost their voice. Here we propose to investigate a variant of this technique in which the PMA data are used to drive an articulatory synthesiser, which generates speech acoustics by simulating the airflow through a computational model of the vocal tract. The project's goals, participants, current status, and achievements are discussed below. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Reporting back environmental exposure data and free choice learning.
Reporting data back to study participants is increasingly being integrated into exposure and biomonitoring studies. Informal science learning opportunities are valuable in environmental health literacy efforts, and report back efforts are filling an important gap in these efforts. Using the University of Arizona's Metals Exposure Study in Homes, this commentary reflects on how community-engaged exposure assessment studies, partnered with data report back efforts, are providing a new informal education setting and stimulating free-choice learning. Participants are capitalizing on participating in research and leveraging their research experience to meet personal and community environmental health literacy goals. Observations from report back activities conducted in a mining community support the idea that reporting back biomonitoring data reinforces free-choice learning and that this activity can lead to improvements in environmental health literacy. By linking the field of informal science education to environmental health literacy concepts, this commentary demonstrates how reporting data back to participants is tapping into what an individual is intrinsically motivated to learn and how these efforts are successfully responding to community-identified education and research needs.
A silent speech system based on permanent magnet articulography and direct synthesis
In this paper we present a silent speech interface (SSI) system aimed at restoring speech communication for individuals who have lost their voice due to laryngectomy or diseases affecting the vocal folds. In the proposed system, articulatory data captured from the lips and tongue using permanent magnet articulography (PMA) are converted into audible speech using a speaker-dependent transformation learned from simultaneous recordings of PMA and audio signals acquired before laryngectomy. The transformation is represented using a mixture of factor analysers, a generative model that allows us to efficiently model non-linear behaviour and perform dimensionality reduction at the same time. The learned transformation is then deployed during normal usage of the SSI to restore the acoustic speech signal associated with the captured PMA data. The proposed system is evaluated using objective quality measures and listening tests on two databases containing PMA and audio recordings for normal speakers. Results show that it is possible to reconstruct speech from articulator movements captured by an unobtrusive technique without an intermediate recognition step. The SSI is capable of producing speech of sufficient intelligibility and naturalness that the speaker is clearly identifiable, but problems remain in scaling up the process to function consistently for phonetically rich vocabularies.
Non-Parallel Articulatory-to-Acoustic Conversion Using Multiview-based Time Warping
This work was supported in part by the Spanish State Research Agency (SRA) grant number PID2019-108040RB-C22/SRA/10.13039/501100011033, and the FEDER/Junta de Andalucía-Consejería de Transformación Económica, Industria, Conocimiento y Universidades project no. B-SEJ-570-UGR20.
In this paper, we propose a novel algorithm called multiview temporal alignment by dependence maximisation in the latent space (TRANSIENCE) for aligning time series whose feature-vector sequences differ in both length and dimensionality. The proposed algorithm, which is based on the theory of multiview learning, can be seen as an extension of the well-known dynamic time warping (DTW) algorithm that additionally allows the sequences to have different dimensionalities. Our algorithm attempts to find an optimal temporal alignment between pairs of nonaligned sequences by first projecting their feature vectors into a common latent space where both views are maximally similar. To do this, powerful, nonlinear deep neural network (DNN) models are employed. Then, the resulting sequences of embedding vectors are aligned using DTW. Finally, the alignment paths obtained in the previous step are applied to the original sequences to align them. In the paper, we explore several variants of the algorithm that mainly differ in the way the DNNs are trained. We evaluated the proposed algorithm on an articulatory-to-acoustic (A2A) synthesis task involving the generation of audible speech from motion data captured from the lips and tongue of healthy speakers using a technique known as permanent magnet articulography (PMA). In this task, our algorithm is applied during the training stage to align pairs of nonaligned speech and PMA recordings that are later used to train DNNs able to synthesise speech from PMA data. Our results show that the quality of speech generated in the nonaligned scenario is comparable to that obtained in the parallel scenario.
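The three steps described in this abstract (project both views into a shared latent space, align the embeddings with DTW, then apply the alignment path to the original sequences) can be illustrated with a minimal sketch. It is not the authors' implementation: fixed linear projection matrices stand in for the trained DNNs, and the names `dtw_path`, `align_views`, `W_art` and `W_ac` are hypothetical.

```python
import numpy as np


def dtw_path(X, Y):
    """Return the DTW alignment path between X (n, d) and Y (m, d)."""
    n, m = len(X), len(Y)
    # Pairwise Euclidean distances between all frames of X and Y.
    dist = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=2)
    # Accumulated cost with an extra border row/column of infinities.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = dist[i - 1, j - 1] + min(
                cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]
            )
    # Backtrack from the last cell to the first to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: cost[s])
        path.append((i - 1, j - 1))
    return path[::-1]


def align_views(art, ac, W_art, W_ac):
    """Align two sequences of different length and dimensionality.

    art: (n, d_art) articulatory frames; ac: (m, d_ac) acoustic frames.
    W_art, W_ac: assumed linear projections into a shared latent space
    (a stand-in for the DNN encoders mentioned in the abstract).
    """
    z_art = art @ W_art            # step 1: project both views
    z_ac = ac @ W_ac
    path = dtw_path(z_art, z_ac)   # step 2: DTW on the embeddings
    idx_art, idx_ac = zip(*path)
    # Step 3: apply the alignment path to the original sequences.
    return art[list(idx_art)], ac[list(idx_ac)]
```

Calling `align_views(art, ac, W_art, W_ac)` returns two arrays with the same number of frames, which could then serve as frame-level training pairs; the quality of the alignment depends entirely on how well the projections make the two views comparable.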
Integrating user-centred design in the development of a silent speech interface based on permanent magnetic articulography
A new wearable silent speech interface (SSI) based on Permanent Magnetic Articulography (PMA) was developed with the involvement of end users in the design process. Hence, desirable features such as appearance, portability, ease of use and light weight were integrated into the prototype. The aim of this paper is to describe the challenges faced and the design considerations addressed during the development. Evaluations of both hardware and speech recognition performance are presented here. The new prototype shows comparable performance to its predecessor in terms of speech recognition accuracy (i.e. ~95% word accuracy and ~75% sequence accuracy), but significantly improved appearance, portability and hardware features in terms of miniaturization and cost.
Multi-view Temporal Alignment for Non-parallel Articulatory-to-Acoustic Speech Synthesis
Articulatory-to-acoustic (A2A) synthesis refers to the generation of audible speech from captured movement of the speech articulators. This technique has numerous applications, such as restoring oral communication to people who can no longer speak due to illness or injury. Most successful techniques so far adopt a supervised learning framework, in which time-synchronous articulatory-and-speech recordings are used to train a supervised machine learning algorithm that can be used later to map articulator movements to speech. This, however, prevents the application of A2A techniques in cases where parallel data is unavailable, e.g., a person has already lost her/his voice and only articulatory data can be captured. In this work, we propose a solution to this problem based on the theory of multi-view learning. The proposed algorithm attempts to find an optimal temporal alignment between pairs of nonaligned articulatory-and-acoustic sequences with the same phonetic content by projecting them into a common latent space where both views are maximally correlated and then applying dynamic time warping. Several variants of this idea are discussed and explored. We show that the quality of speech generated in the non-aligned scenario is comparable to that obtained in the parallel scenario. This work was funded by the Spanish State Research Agency (SRA) under the grant PID2019-108040RB-C22/SRA/10.13039/501100011033. Jose A. Gonzalez-Lopez holds a Juan de la Cierva-Incorporation Fellowship from the Spanish Ministry of Science, Innovation and Universities (IJCI-2017-32926).
Cascaded- and Modular-Multilevel Converter Laboratory Test System Options: A Review
The increasing importance of cascaded multilevel converters (CMCs), and the sub-category of modular multilevel converters (MMCs), is illustrated by their wide use in high voltage DC connections and in static compensators. Research is being undertaken into the use of these complex pieces of hardware and software for a variety of grid support services, on top of fundamental frequency power injection, requiring improved control for non-traditional duties. To validate these results, small-scale laboratory hardware prototypes are often required. Such systems have been built by many research teams around the globe and are also increasingly commercially available. Few publications go into detail on the construction options for prototype CMCs, and there is a lack of information on both design considerations and lessons learned from the build process, which will hinder research and the best application of these important units. This paper reviews options, gives key examples from leading research teams, and summarizes knowledge gained in the development of test rigs to clarify design considerations when constructing laboratory-scale CMCs. This work was supported in part by The University of Manchester supported by the National Innovation Allowance project "VSC-HVDC Model Validation and Improvement" and Dr. Heath's iCASE Ph.D. studentship supported through Engineering and Physical Sciences Research Council (EPSRC) and National Grid, in part by the Imperial College London supported by EPSRC through the HubNet Extension under Grant EP/N030028/1, in part by an iCASE Ph.D. Studentship supported by EPSRC and EDF Energy and the CDT in Future Power Networks under Grant EP/L015471/1, in part by University of New South Wales (UNSW) supported by the Solar Flagships Program through the Education Infrastructure Fund (EIF), in part by the Australian Research Council through the Discovery Early Career Research Award under Grant DECRA_DE170100370, in part by the Basque Government through the project HVDC-LINK3 under Grant ELKARTEK KK-2017/00083, in part by the L2EP research group at the University of Lille supported by the French TSO (RTE), and in part by the Hauts-de-France region of France with the European Regional Development Fund under Grant FEDER 17007725.
Real-Time Detection and Rapid Multiwavelength Follow-up Observations of a Highly Subluminous Type II-P Supernova from the Palomar Transient Factory Survey
The Palomar Transient Factory (PTF) is an optical wide-field variability
survey carried out using a camera with a 7.8 square degree field of view
mounted on the 48-in Oschin Schmidt telescope at Palomar Observatory. One of
the key goals of this survey is to conduct high-cadence monitoring of the sky
in order to detect optical transient sources shortly after they occur. Here, we
describe the real-time capabilities of the PTF and our related rapid
multiwavelength follow-up programs, extending from the radio to the gamma-ray
bands. We present as a case study observations of the optical transient
PTF10vdl (SN 2010id), revealed to be a very young core-collapse (Type II-P)
supernova having a remarkably low luminosity. Our results demonstrate that the
PTF now provides for optical transients the real-time discovery and
rapid-response follow-up capabilities previously reserved only for high-energy
transients like gamma-ray bursts. Comment: ApJ, in press; all spectroscopic data
available from the Weizmann Institute of Science Experimental Astrophysics
Spectroscopy System (WISEASS; http://www.weizmann.ac.il/astrophysics/wiseass/)